Journal of Medical Internet Research
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Eye tracking is recognized as a gold standard for measuring visual attention and cognitive engagement. In this study, it offers a useful lens for understanding how primary care providers balance patient communication with navigation of electronic health records (EHRs). We used wearable eye tracking to collect visual information processing behavior and conducted a retrospective think-aloud protocol to examine how primary care clinicians processed suicide-related information (CAT-MH®) embedded in...
Background: Typing in the electronic health record (EHR) takes up healthcare providers' time and cognitive space and constitutes a substantial administrative burden contributing to high burnout rates in healthcare. Ambient digital scribes may improve this problem. Objective: To investigate the effect of the use of Autoscriber, an ambient digital scribe, on healthcare providers' administrative workload and the quality of medical notes in the EHR. Methods: A study period of 26 weeks was randomized into ...
Background: Artificial intelligence is increasingly embedded in healthcare delivery. Its legitimacy depends on institutional governance, not technical performance alone. Prior research has centered on clinicians and patients. Less attention has been given to cybersecurity professionals who sustain the digital infrastructures that support health AI. This study examines how cybersecurity professionals conceptualize AI as clinical infrastructure and how these interpretations shape understandings of t...
Background: Generative artificial intelligence (GenAI) in healthcare may reduce administrative burden and enhance quality of care. Large language models (LLMs) can generate draft responses to patient messages using electronic health record (EHR) data. This could mitigate increased workload related to high message volumes. While effectiveness and feasibility of these GenAI tools have been studied in the United States, evidence from non-English contexts is scarce, particularly regarding user experie...
Background: Tinnitus affects a substantial proportion of the global population and can severely disrupt sleep, mood, and daily functioning, yet the quality of mobile health apps designed for tinnitus management remains highly variable. Traditional evaluation methods, including clinical trials, expert rating scales, and small-scale surveys, rarely capture large-scale, feature-level feedback from real-world users, leaving a gap in understanding which app characteristics drive sustained engagement an...
Large language models (LLMs) are increasingly used by the public to seek health information, yet their reliability in addressing common vaccine myths remains unclear. We conducted an exploratory multi-vendor evaluation of three LLMs (GPT-5, Gemini 2.5 Flash, Claude Sonnet 4) using officially curated vaccination myths from Germany's public health institution and two realistic user framings as prompts: a curious skeptic and a convinced believer. All model responses were independently evaluated by t...
Artificial intelligence (AI) is increasingly integrated into healthcare delivery, yet patient acceptance in resource-constrained settings remains incompletely characterized. This study assessed attitudes toward AI-supported care among patients attending hospitals in three Jordanian governorates (Amman, Balqa, Irbid) and examined demographic and digital literacy correlates of acceptance. In a cross-sectional survey (n = 500 complete questionnaires), participants rated exposure to AI in healthcare...
Objective: The use of ambient AI documentation tools is rapidly growing in US hospitals and clinics. Such tools generate the first draft of clinical notes from scribed patient-provider conversations, which clinicians can then review and edit before signing into electronic health records (EHR). Understanding how and why clinicians make modifications to AI-generated drafts is critical to improving AI design and clinical efficiency, yet it has been under-studied. Th...
Background: Delivering timely, high-quality feedback on resident scholarly projects is labour-intensive, especially in large programmes. We developed an AI-assisted evaluation system, powered by the open-weight LLaMA-3.1 large-language model (LLM), to generate formative feedback on Family Medicine residents' scholarly projects and compared its performance with expert human evaluators. Methods: We evaluated whether the AI-generated feedback achieves comparable quality to expert feedback. The tool ing...
We developed and validated a self-administered clinical vignette platform powered by a large language model (LLM), deployed through a SurveyCTO web survey, to measure primary health care provider competencies in Vietnam. In a pilot focus group, nine physicians rated LLM-simulated patient interactions as realistic (mean 3.78/5) and user-friendly. In the validation phase, 22 providers completed 132 vignette interactions across ten clinical scenarios in Vietnamese. Essential diagnostic checklist sc...
Background: EHR documentation and chart review contribute to clinician workload and burnout. To alleviate pre-charting burden, Epic has released a new generative AI chart summarizer tool, which has become widely adopted; however, its impact has not been examined in randomized trials. Objective: To evaluate whether access to an Epic generative AI chart summarization tool reduces cognitive task load among ambulatory providers compared with usual care. Methods: Two-arm, parallel-group randomized contro...
Ambient intelligence-based systems are increasingly used for clinical documentation. To quantify linguistic differences associated with ambient documentation, we conducted a matched pre-post analysis of 6,026 outpatient clinical notes from Mass General Brigham following implementation of two ambient AI documentation systems (Nuance Dragon Ambient eXperience [DAX] and Abridge). Within-clinician comparisons focused on the History of Present Illness (HPI) and Assessment and Plan (A&P) sections and ...
Accurate health information is ineffective if patients cannot understand it. Large Language Model (LLM) health research values veridical precision; however, linguistic accessibility remains an under-examined component of output quality and usability. This study investigated two sources of variability in readability classification: differences across LLM systems and across readability metrics. The analysis tested 1,120 data points from seven systems in English and Portuguese, comparing ba...
Health behaviors such as physical activity and sleep affect mental health, but the effect of each health behavior varies substantially across individuals, limiting the usefulness of generic behavioral recommendations. We collected one year of continuous wearable and ecological momentary assessment data from 3,139 participants in the Intern Health Study (2018-2023), and examined individual-level associations between wearable-derived features and mood across the internship year. The behaviors asso...
Wearable devices can capture changes in human behaviors related to mental health, including depression and anxiety. Here, we examined whether and how digital metrics from a consumer-grade wearable smart ring (Oura Ring) differed by severity of depression and anxiety symptoms using data from a large-scale population-based sample of young adults (n=1,290, age range: 33-35). Participants wore the ring for two weeks, assessing sleep architecture, nocturnal heart rate (HR), heart rate variability (HRV...
Background: Objective Structured Clinical Examination (OSCE; Clinical Performance Examination [CPX] in South Korea) is a high-stakes assessment of clinical performance, communication, and reasoning during time-limited patient encounters. As AI-enabled virtual standardized patient (VSP) simulation and automated scoring are introduced for OSCE-like training, prospective evidence is needed on how such systems perform and are perceived when embedded in real educational workflows. Methods: We developed ...
Introduction: Clinicians and patients are likely to increasingly use Large Language Models (LLMs) for diagnostic support. Use of LLMs mostly created in North America and Europe could lead to a High-Income Country bias if used in Low- and Middle-Income Country (LMIC) healthcare settings. We aimed to explore if diagnostic suggestions made by LLMs are relevant in LMIC settings. Methods: Five short respiratory clinical vignettes were produced. For each vignette, a group of doctors from one of 5 countr...
Introduction: Healthcare organizations have begun incorporating screening procedures for social determinants of health (SDOH) into care, recognizing the impact these factors can have on health outcomes. We aimed to present methods for evaluating redundancy in the risk information gained across SDOH questions and for evaluating whether demographic biases are present in whether patients were asked SDOH questions and whether they declined to answer them. Methods: SDOH question data were analyzed for 1...
Background: Interprofessional teams are central to high-quality patient care. However, identifying the clinician primarily responsible for a patient requires labor-intensive methodologies. Although electronic health record (EHR) audit logs offer a scalable alternative, their use for identifying frontline clinicians is underdeveloped. Objective: To develop and validate an algorithm utilizing EHR audit logs to identify the primary frontline clinician per patient day of an encounter and to describe care...
Ambient AI documentation tools generate draft notes that clinicians can review and edit before signing off in electronic health records. Scalable computational approaches to characterize how clinicians modify drafts remain limited, yet are essential for evaluating and improving AI effectiveness. We examined the feasibility of a few-shot prompted large language model (LLM) for categorizing sentence-level edits between AI drafts and final documentation. We developed five label-specific binary mode...